Confidence Bands in Straight Line Regression

by 

A. V. Gafarian

ABSTRACT 


This  paper  develops  a  method  for  obtaining  confidence  bands  in  polynomial 
regression  when  the  observations  are  independently  distributed  with  constant  but 
unknown  variance.  The  bands  may  be  obtained,  in  principle,  over  arbitrary  sets 
of  the  independent  variable  with  exact  preassigned  confidence  coefficients.  In 
general,  difficult  distribution  problems  result  when  specific  applications  are 
attempted.  The  major  portion  of  this  paper  is  concerned  with  first  degree 
polynomials  since  some  progress  has  been  made  here.  A  table  is  provided  to 
obtain  a  constant  width  confidence  band  which  contains  the  true  but  unknown 
straight  regression  line  for  values  of  the  independent  variable  in  some  arbitrarily 
selected  interval  with  an  exact  preassigned  confidence  coefficient.  The  present 
method  is  compared  with  the  classical  hyperbolic  band  for  the  whole  regression 
line. 
1. INTRODUCTION AND SUMMARY

The basic problem considered in this paper is the following. Suppose for
every $t \in (-\infty,\infty)$, $Y_t$ is a normal random variable with unknown
variance $\sigma^2$ and mean value $m_t$ given by a polynomial of known degree
$r \ge 1$ and unknown coefficients. Let $I$ be a subset of interest in
$(-\infty,\infty)$. Based on mutually independent observations it is desired to
construct simultaneous confidence intervals for $m_t$, $t \in I$, with
preassigned probability $1-\alpha$. It should be pointed out that the material
discussed here is close to methods called "multiple comparisons" in other
contexts.

A well known result occurs when the set $I$ contains only one point, Graybill
[1, pp. 121-122]. It must be emphasized that if intervals are computed by that
technique for every $t$, no confidence statement may be made about the resulting
band (a hyperbola for $r = 1$) containing the unknown regression line, i.e., that
method does not provide simultaneous coverage of the ordinates of the regression
line. Less known is the work of Working and Hotelling [2] in which a hyperbolic
confidence band is obtained for the whole regression line when it is assumed
the variance is known. The method is easily extended to the unknown variance
case and provides a hyperbolic band valid for the whole regression line, Scheffé
[3, pp. 52, 53]. Hoel [4] extends the method of Working and Hotelling for the
straight line regression in such a way as to make it possible to find an
optimum confidence band. The optimum band is defined to be that band of an
admissible class of bands such that its expected total area is a minimum. Also,
in [4] the case of polynomial regression of degree two or higher is considered
and a procedure similar to the first degree case is outlined. However, in these
cases the confidence bands possess confidence coefficients $\ge 1-\alpha$.
The present study was undertaken to extend some of the results described
above. Ordinarily an experimenter is not interested in coverage of the whole
regression curve. On the contrary, interest lies in only a bounded interval
or even a finite set of points. The restriction of the above described bands
to bounded sets of interest yields confidence coefficients $\ge 1-\alpha$ (even
in the first degree case). A method for providing a band that is valid only for
the set of interest may yield a more efficient band. Secondly, it would be
desirable to maintain a uniform degree of accuracy over the set of interest,
i.e., the width of the band is the same for all values of the independent
variable $t$ in the set of interest.

This paper develops a general method for obtaining confidence bands of
arbitrary shape and over any arbitrary subset of the line when the observations
are independently normally distributed. The shape is arbitrary in the sense
that if $w$ is any positive function defined over the subset $I$ of interest in
$(-\infty,\infty)$ then the width of the band for $t \in I$ is proportional to
$w(t)$. Thus, by selecting $w(t) \equiv 1$, $t \in I$, the resulting band has
the same width for every $t \in I$.

In general, difficult distribution problems result when specific applications
are attempted. The major portion of this paper is concerned with first
degree polynomials since some progress has been made here. A table is provided
to obtain a confidence band which contains the true regression line for values
of the independent variable in an arbitrarily selected interval of interest
$[a,b]$ with an exact confidence coefficient. The band has the same width for
all values $t \in [a,b]$. The table is constructed for use in the following
situation: (1) The sample size $n$ is even; (2) If observations are made at the
values $t_1, t_2, \ldots, t_n$ of the independent variable then
$\bar t = \frac{1}{n}\sum_{i=1}^{n} t_i = \frac{a+b}{2}$. Defining $[A,B]$ as
the interval in which observations are permissible, a best solution obtains if
in addition (3) $\frac{A+B}{2} = \frac{a+b}{2}$, i.e., the observation interval
$[A,B]$ is symmetrically located with respect to the interval of interest
$[a,b]$; (4) (2) is realized by making half the observations at $A$ and half at
$B$. The solution is best in the sense that for a given $n$, $(B-A)/(b-a)$, and
probability of coverage this particular experimental configuration achieves the
smallest bandwidth.

The  important  feature  of  the  band  provided  by  the  present  method  is  that 
it  is  uniformly  wide  over  [a,b].  In  order  to  get  some  idea  of  its  efficiency 
it  was  compared  to  the  band  that  arises  by  merely  considering  the  restriction 
of  the  hyperbolic  one  to  the  interval  [a,b],  though  in  this  case  the  probability 
of coverage is no longer $1-\alpha$ but $\ge 1-\alpha$. The comparison was made in terms of
the  areas  of  the  bands.  To  be  more  specific  for  a  given  n,  (B-A)/(b-a),  and 
probability  of  coverage,  the  best  band  (i.  e. ,  minimum  area)  was  computed  by 
the  present  method.  The  experimental  configuration  to  achieve  this  also 
provides  the  minimum  area  over  [a,b]  for  the  hyperbolic  band.  The  ratio  of 
the  two  areas  was  then  considered  as  a  measure  of  the  efficiency.  Roughly, 
the  result  is  that  for  (B-A)/(b-a)  >  3/2  the  present  method  is  more  efficient 
and  for  (B-A)/(b-a)  <  3/2  the  restriction  of  the  hyperbolic  band  to  [a,b] 
yields  smaller  areas.  More  specific  calculations  will  be  presented  in  a  later 
section  of  the  paper. 
2. GENERAL TECHNIQUE

Suppose that for every $t \in (-\infty,\infty)$, $Y_t$ is a normal random
variable with unknown variance $\sigma^2$ and expectation given by a polynomial
$\beta_0 + \beta_1 t + \ldots + \beta_r t^r$ of unknown coefficients and known
degree $r$. Let $I \subset (-\infty,\infty)$ be the set of interest. For
preassigned confidence coefficient $1-\alpha$ and positive function $w$ defined
on $I$ it is desired to obtain simultaneous confidence intervals for
$E[Y_t] = m_t$, $t \in I$, such that the length of the interval for each
$t \in I$ is proportional to $w(t)$.

Suppose independent observations are made at the time points
$t_1, t_2, \ldots, t_n$, where the number of distinct observation points is
$\ge r+1$ and the number of observations is $> r+1$ (this ensures that
$\sigma^2$ may be estimated since only $r+1$ distinct points are needed for the
estimability of the linear parameters). Let
$\hat\beta' = (\hat\beta_0 \ldots \hat\beta_r)$ denote the vector of least
squares estimates for $\beta' = (\beta_0 \ldots \beta_r)$ given by

$$\hat\beta = (T'T)^{-1}T'Y$$

where

$$T = \begin{pmatrix} 1 & t_1 & \cdots & t_1^r \\ 1 & t_2 & \cdots & t_2^r \\ \vdots & \vdots & & \vdots \\ 1 & t_n & \cdots & t_n^r \end{pmatrix}$$

and $Y' = (y_1\ y_2 \ldots y_n)$ is the vector of observations at the points
$t_1, \ldots, t_n$.
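Denote by

$$\hat\sigma^2 = \frac{1}{n-r-1}(Y - T\hat\beta)'(Y - T\hat\beta)$$

the independent unbiased estimator of $\sigma^2$ based on $n-r-1$ degrees of
freedom. Let $\hat m_t$ be the best linear estimate of $m_t$ given by
$\sum_{j=0}^{r} \hat\beta_j t^j$. Form the function

$$\frac{\hat m_t - m_t}{w(t)\hat\sigma}$$

and for any pair of numbers $(\delta_1,\delta_2)$ with $\delta_1 < \delta_2$ let

$$V(\delta_1,\delta_2) = \left\{ \delta_1 \le \frac{\hat m_t - m_t}{w(t)\hat\sigma} \le \delta_2,\ t \in I \right\}$$

in the space of the random vector $(\hat\beta - \beta)/\hat\sigma$ whose
distribution is parameter free and calculable [5]. These are sufficient
conditions to obtain

$$[\hat m_t - \delta_2 w(t)\hat\sigma,\ \hat m_t - \delta_1 w(t)\hat\sigma],\quad t \in I$$

as simultaneous confidence intervals of confidence coefficient
$P[V(\delta_1,\delta_2)]$ [6]. The width of the band for any $t \in I$ is
$(\delta_2-\delta_1)\,w(t)\,\hat\sigma$.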
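The interval formula of this section can be sketched in code. This is only an illustration and not part of the original report: the function name `simultaneous_band` and its arguments are hypothetical, and the constants $\delta_1, \delta_2$ are assumed to be known already (obtaining them for a required confidence coefficient is the hard part of the problem).

```python
from typing import Callable, List, Sequence, Tuple


def simultaneous_band(
    m_hat: Callable[[float], float],   # fitted mean curve (assumed given)
    sigma_hat: float,                  # estimate of sigma (assumed given)
    w: Callable[[float], float],       # positive shape function on I
    t_values: Sequence[float],         # points of the set of interest I
    delta1: float,
    delta2: float,
) -> List[Tuple[float, float]]:
    """Return [m_hat(t) - delta2*w(t)*sigma_hat, m_hat(t) - delta1*w(t)*sigma_hat]."""
    if not delta1 < delta2:
        raise ValueError("delta1 must be smaller than delta2")
    intervals = []
    for t in t_values:
        wt = w(t)
        if wt <= 0:
            raise ValueError("w must be positive on the set of interest")
        centre = m_hat(t)
        intervals.append((centre - delta2 * wt * sigma_hat,
                          centre - delta1 * wt * sigma_hat))
    return intervals


# Constant-width example: w(t) = 1 and delta1 = -delta2.
band = simultaneous_band(lambda t: 1.0 + 2.0 * t, 0.5, lambda t: 1.0,
                         [0.0, 1.0], -2.0, 2.0)
```

With $w(t) \equiv 1$ every interval has the same length $(\delta_2-\delta_1)\hat\sigma$, which is the uniform band studied in the rest of the paper.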

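To insure the existence of at least one pair $(\delta_1,\delta_2)$ to acquire
the probability $1-\alpha$, an additional restriction must be imposed on the
function $w$. The set $V(\delta_1,\delta_2)$ may be written as

$$V(\delta_1,\delta_2) = \bigcap_{t \in I} \left\{ \delta_1 \le \frac{\hat m_t - m_t}{w(t)\hat\sigma} \le \delta_2 \right\}.$$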

Let 
Since

$$\frac{\hat m_t - m_t}{w(t)\hat\sigma} = \frac{1}{w(t)}\,(1\ t \ldots t^r)\,\frac{\hat\beta - \beta}{\hat\sigma}$$

it follows that each set in the above intersection consists of the points between
two parallel hyperplanes which are perpendicular to $(1\ t \ldots t^r)'$ and are
at distances $w(t)\,|\delta_2| \big/ \big(\sum_{j=0}^{r} |t|^{2j}\big)^{1/2}$ and
$w(t)\,|\delta_1| \big/ \big(\sum_{j=0}^{r} |t|^{2j}\big)^{1/2}$ from the origin.

Hence, if there exist constants $m > 0$ and $M > 0$ such that
$m \le w(t) \big/ \big(\sum_{j=0}^{r} |t|^{2j}\big)^{1/2} \le M$ for $t \in I$
then and only then does there exist a pair $(\delta_1,\delta_2)$ (actually many
pairs) such that the required probability is attained.

It is conjectured that optimum confidence intervals are obtained whenever
$\delta_2$ is taken $> 0$ and $\delta_1 = -\delta_2$. The optimum is in the sense
that for a given confidence coefficient $1-\alpha$ the difference
$\delta_2 - \delta_1$, and hence the length of the confidence intervals, will be
minimized. This conjecture is based on: (1) The fact that the density function
for the random vector $(\hat\beta - \beta)/\hat\sigma$ is constant on concentric
$(r+1)$-dimensional ellipsoids with center at origin and decreases monotonely
with distance from the origin, and (2) The set $V(\delta_1,\delta_2)$ in this
situation is symmetrical with respect to the origin and probably has a maximum
volume for any fixed difference $\delta_2 - \delta_1$.

It should be emphasized again that the real difficulty here is the calculation
of $\delta_1$ and $\delta_2$ to achieve probability $1-\alpha$ when any specific
applications are attempted. Progress has been made for the case $r = 1$, $I$ an
interval, and $w(t) = 1$ for $t \in I$, i.e., a band which has the same width
over the interval of interest.

The major portion of the remainder of the paper is devoted to this problem.
However, for some special examples the general case specializes properly to well
known results. E.g., if $I$ is a single point only, say $t_0$, $w(t_0) = 1$,
$\delta_1 = -\delta_2$ and $\delta_2 > 0$, it can be shown that

$$\delta_2 = t_{\frac{\alpha}{2};\,n-r-1}\,\big[(1\ t_0 \ldots t_0^r)(T'T)^{-1}(1\ t_0 \ldots t_0^r)'\big]^{1/2}$$

where $t_{\frac{\alpha}{2};\,n-r-1}$ is the upper $\alpha/2$ point of a
t-variable with $n-r-1$ degrees of freedom, so that

$$P\Big[\sum_{j=0}^{r}\hat\beta_j t_0^j - \delta_2\hat\sigma \le \sum_{j=0}^{r}\beta_j t_0^j \le \sum_{j=0}^{r}\hat\beta_j t_0^j + \delta_2\hat\sigma\Big] = 1-\alpha$$

[1, p. 122]. Similarly, consider the set of all linear combinations
$\{\beta_0 u_0 + \beta_1 u_1 + \ldots + \beta_r u_r : (u_0\ u_1 \ldots u_r) \in E^{r+1}\}$.
Setting $\delta_1 = -\delta_2$ and $\delta_2 > 0$ and defining $w$ for any
$(u_0 \ldots u_r)$ to equal

$$\big[(u_0\ u_1 \ldots u_r)(T'T)^{-1}(u_0\ u_1 \ldots u_r)'\big]^{1/2}$$

gives that

$$\delta_2^2 = (r+1)\,F_{\alpha;\,r+1,\,n-r-1},$$

where $F_{\alpha;\,r+1,\,n-r-1}$ is the upper $\alpha$ point of an F-variable
with $r+1$ and $n-r-1$ degrees of freedom. This then gives

$$P\Big[\Big|\sum_{i=0}^{r}\hat\beta_i u_i - \sum_{i=0}^{r}\beta_i u_i\Big| \le \big((r+1)F_{\alpha;\,r+1,\,n-r-1}\big)^{1/2}\big[(u_0\ u_1 \ldots u_r)(T'T)^{-1}(u_0\ u_1 \ldots u_r)'\big]^{1/2}\hat\sigma;\ (u_0\ u_1 \ldots u_r) \in E^{r+1}\Big] = 1-\alpha$$

[6]. An infinite subset of the above intervals is then a confidence band of
confidence coefficient $\ge 1-\alpha$ for the mean curve. For $r = 1$ this gives
a band for the whole line with exact confidence coefficient $1-\alpha$. A little
calculation shows this to be the hyperbolic band referred to in Section 1.
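For the straight line ($r = 1$) the hyperbolic band can be evaluated without tables, because an F-variable with 2 and $\nu$ degrees of freedom satisfies $P(F > x) = (1 + 2x/\nu)^{-\nu/2}$, so its upper $\alpha$ point has a closed form. The sketch below is an illustration added here, not the report's own computation; the function names are hypothetical, and $\hat\sigma$ is assumed to have been estimated already.

```python
import math
from typing import Sequence


def f_upper_point_2(alpha: float, nu: int) -> float:
    """Upper alpha point of F with 2 and nu d.f.

    Uses P(F > x) = (1 + 2x/nu)^(-nu/2), valid only for 2 numerator d.f.
    """
    return (nu / 2.0) * (alpha ** (-2.0 / nu) - 1.0)


def hyperbolic_half_width(t: float, t_obs: Sequence[float],
                          sigma_hat: float, alpha: float) -> float:
    """Half-width at t of the whole-line band for straight line regression."""
    n = len(t_obs)
    if n <= 2:
        raise ValueError("need more than two observations")
    t_bar = sum(t_obs) / n
    sxx = sum((ti - t_bar) ** 2 for ti in t_obs)
    if sxx == 0:
        raise ValueError("at least two observation points must be distinct")
    # delta_2 = sqrt(2 F) and w(t) = sqrt(1/n + (t - t_bar)^2 / sxx) for r = 1
    delta2 = math.sqrt(2.0 * f_upper_point_2(alpha, n - 2))
    return delta2 * sigma_hat * math.sqrt(1.0 / n + (t - t_bar) ** 2 / sxx)
```

The half-width is smallest at $t = \bar t$ and grows on both sides, which is what gives the band its hyperbolic shape.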

3.  STRAIGHT  LINE  REGRESSION 

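This section contains the analysis in detail of the straight line regression
case. For convenience the regression line is written in the form

$$m_t = \beta_0 + \beta_1 (t - \bar t)$$

where $\bar t = \frac{1}{n}\sum_{i=1}^{n} t_i$, $n > 2$. The $t_i$'s are
observation points such that at least two are distinct. The observation at $t_i$
is denoted by $y_i$. It is supposed that observations may be made only in an
interval $[A,B]$ and that a uniformly wide confidence band is required for the
interval $[a,b]$, i.e., $w(t) = 1$ for $t \in [a,b]$. Proceeding as outlined in
Section 2, form the function

$$\frac{\hat m_t - m_t}{\hat\sigma} = \frac{\hat\beta_0 - \beta_0}{\hat\sigma} + \frac{\hat\beta_1 - \beta_1}{\hat\sigma}\,(t - \bar t)$$

where

$$\hat\beta_0 = \frac{1}{n}\sum_{i=1}^{n} y_i, \qquad \hat\beta_1 = \frac{\sum_{i=1}^{n}(t_i - \bar t)\,y_i}{\sum_{i=1}^{n}(t_i - \bar t)^2}, \qquad \hat\sigma^2 = \frac{1}{n-2}\sum_{i=1}^{n}\big[y_i - \hat\beta_0 - \hat\beta_1 (t_i - \bar t)\big]^2$$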
are stochastically independent. Determine for $\delta > 0$

$$V(-\delta,\delta) = \left\{ \left(\frac{\hat\beta_0 - \beta_0}{\hat\sigma},\ \frac{\hat\beta_1 - \beta_1}{\hat\sigma}\right) : -\delta \le \frac{\hat\beta_0 - \beta_0}{\hat\sigma} + \frac{\hat\beta_1 - \beta_1}{\hat\sigma}\,(t - \bar t) \le \delta,\ t \in [a,b] \right\},$$

or equivalently the image $V'(-\delta,\delta)$ of $V(-\delta,\delta)$ in the
plane of t-variables of $n-2$ degrees of freedom

$$u = \sqrt{n}\,\frac{\hat\beta_0 - \beta_0}{\hat\sigma}, \qquad v = \sqrt{n}\,s\,\frac{\hat\beta_1 - \beta_1}{\hat\sigma},$$

where

$$s^2 = \frac{1}{n}\sum_{i=1}^{n}(t_i - \bar t)^2.$$

The resulting set is a parallelogram and is shown in Fig. 1. The density
function $g$ of $(u,v)$ is given by

$$g(u,v) = \frac{1}{2\pi}\left[1 + \frac{u^2 + v^2}{n-2}\right]^{-\frac{n}{2}}.$$

Figure 1

From symmetry of the density function we need to consider only the probability
of the triangle $T(-\delta,\delta)$ in the upper half plane.
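The estimators of this section and the resulting uniformly wide band $\hat m_t \pm \delta\hat\sigma$ can be sketched as follows. This is an added illustration with hypothetical function names; the value of $\delta$ is assumed to be supplied (in the paper it comes from the table of Section 4).

```python
import math
from typing import Sequence, Tuple


def fit_centered_line(t_obs: Sequence[float],
                      y_obs: Sequence[float]) -> Tuple[float, float, float, float, float]:
    """Fit m_t = beta0 + beta1 * (t - t_bar); return (beta0, beta1, sigma_hat, t_bar, s)."""
    n = len(t_obs)
    if n <= 2 or len(y_obs) != n:
        raise ValueError("need n > 2 paired observations")
    t_bar = sum(t_obs) / n
    sxx = sum((t - t_bar) ** 2 for t in t_obs)
    if sxx == 0:
        raise ValueError("at least two observation points must be distinct")
    beta0 = sum(y_obs) / n
    beta1 = sum((t - t_bar) * y for t, y in zip(t_obs, y_obs)) / sxx
    rss = sum((y - beta0 - beta1 * (t - t_bar)) ** 2
              for t, y in zip(t_obs, y_obs))
    sigma_hat = math.sqrt(rss / (n - 2))   # n - 2 degrees of freedom
    s = math.sqrt(sxx / n)                 # s^2 = (1/n) * sum (t_i - t_bar)^2
    return beta0, beta1, sigma_hat, t_bar, s


def constant_width_band(t: float, fit: Tuple[float, float, float, float, float],
                        delta: float) -> Tuple[float, float]:
    """Band ordinates at t: fitted value -/+ delta * sigma_hat (w(t) = 1)."""
    beta0, beta1, sigma_hat, t_bar, _ = fit
    centre = beta0 + beta1 * (t - t_bar)
    return centre - delta * sigma_hat, centre + delta * sigma_hat


fit = fit_centered_line([0.0, 1.0, 2.0, 3.0], [1.1, 2.9, 5.2, 6.8])
```

The band has width $2\delta\hat\sigma$ at every $t$, so the whole difficulty lies in choosing $\delta$ so that the coverage over $[a,b]$ is exactly $1-\alpha$.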

An examination of Fig. 1 illustrates the fact that for a fixed $[a,b]$, $[A,B]$,
$\delta$, and $n$, different values of $\bar t$ and $s$ result in different
confidence coefficients. The problem of maximizing the confidence coefficient is
now investigated.

For a given $\bar t$ the claim is that the confidence coefficient is maximized
when the variance of the observation points is maximized. For $\bar t$ such that
the apex of the triangle is in $[-\sqrt{n}\,\delta,\ \sqrt{n}\,\delta]$, this is
clear from the fact that if $s_2^2 > s_1^2$ are the variances of two
configurations with corresponding triangles $T_2(-\delta,\delta)$ and
$T_1(-\delta,\delta)$ respectively, then
$T_2(-\delta,\delta) \supset T_1(-\delta,\delta)$. If $\bar t$ is such that the
apex of $T(-\delta,\delta)$ lies in the complement of
$[-\sqrt{n}\,\delta,\ \sqrt{n}\,\delta]$, it is not patently clear that the
probability increases with $s$. That this, however, is the case is shown as
follows. Let $h(\xi,\eta) = P[T(-\delta,\delta)]$, where $\xi$ and $\eta$ are the
$u$ and $v$ coordinates of the apex. Then


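$$h(\xi,\eta) = \int_0^{\eta} dv \int_{\alpha_1(\xi,\eta)}^{\alpha_2(\xi,\eta)} du\ \frac{1}{2\pi}\left[1 + \frac{u^2 + v^2}{n-2}\right]^{-\frac{n}{2}},$$

where

$$\alpha_1(\xi,\eta) = \frac{\xi + \delta\sqrt{n}}{\eta}\,v - \delta\sqrt{n}, \qquad \alpha_2(\xi,\eta) = \frac{\xi - \delta\sqrt{n}}{\eta}\,v + \delta\sqrt{n}.$$

Hence

$$2\pi\eta^2\,\frac{\partial}{\partial\eta}h(\xi,\eta) = \xi\,(I_1 - I_2) + \sqrt{n}\,\delta\,(I_1 + I_2),$$

where

$$I_i = \int_0^{\eta} dv\ v\left[1 + \frac{\alpha_i^2(\xi,\eta) + v^2}{n-2}\right]^{-\frac{n}{2}}, \qquad i = 1, 2.$$

But $I_1 > I_2$ for $\xi > 0$ since

$$\alpha_2^2(\xi,\eta) - \alpha_1^2(\xi,\eta) = \frac{4\sqrt{n}\,\delta\,\xi\,v}{\eta}\left(1 - \frac{v}{\eta}\right) > 0, \qquad 0 < v < \eta.$$

Thus for $\xi > 0$ (and by symmetry for $\xi < 0$)

$$\frac{\partial}{\partial\eta}h(\xi,\eta) > 0.$$

This proves that for a given $\bar t$, $A \le \bar t \le B$, the variance of the
$n$ observation points must be maximized. Intuitively, this is what one would
expect.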

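It can be shown (Appendix) that for any $\bar t$, $A \le \bar t \le B$, the
corresponding maximum $s^2$ which may be attained by the observation points
$\{t_1, t_2, \ldots, t_n\}$ is $(B-A)^2 f^2(\tau)$, where

$$f^2(\tau) = \frac{k + (n\tau - k)^2}{n} - \tau^2, \qquad \frac{k}{n} \le \tau \le \frac{k+1}{n}, \quad k = 0, 1, \ldots, n-1,$$

and $\tau = \dfrac{\bar t - A}{B-A}$.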


17  April  1963 




SP-1181/000/00 




The configuration of observation points to obtain this maximum occurs with $k$
$t_i$'s at $B$, one $t_i$ at $n(\bar t - A) - k(B-A) + A$, and $n-(k+1)$ $t_i$'s
at $A$. Thus for a fixed $\bar t$, the maximum confidence coefficient for the
band of width $2\delta\hat\sigma$ is achieved when the coordinates of the apex
are

$$u = 2\sqrt{n}\,\delta\,\ell\,(\tau - e), \qquad v = 2\sqrt{n}\,\delta\,\ell\,f(\tau),$$

where

$$\ell = \frac{B-A}{b-a}$$

and

$$e = \frac{\frac{a+b}{2} - A}{B-A}.$$

Plots of the loci of the apex are shown in Figures 2 and 3 for an even and an
odd sample size respectively. Each section of the curve corresponds to the range

$$\frac{k}{n} \le \tau = \frac{\bar t - A}{B-A} \le \frac{k+1}{n}, \qquad k = 0, 1, \ldots, n-1.$$

Figure 3


Due  to  the  symmetry  of  the  problem,  it  may  always  be  assumed  that  <  e  < 
i.e. ,  the  midpoint  of  the  interval  [a,b]  is  always  to  the  left  of  the  midpoint 
of $[A,B]$. Whenever $e = \tfrac{1}{2}$, i.e., $[A,B]$ and $[a,b]$ have the same
midpoint, the contours are symmetrical with respect to the v-axis. For
$e < \tfrac{1}{2}$, the contours are shifted to the right by the amount
$2\sqrt{n}\,\delta\,\ell\,(\tfrac{1}{2} - e)$.


The problem of choosing the best $\bar t$ for a fixed $[A,B]$, $[a,b]$, $n$, and
$\delta$ is now considered. The best $\bar t$ is defined as the one that yields
the maximum confidence coefficient when its corresponding maximum variance
configuration is used (or equivalently minimizes $\delta$ for a given confidence
coefficient $1-\alpha$, $[A,B]$, $[a,b]$, and $n$). Intuitively one would expect
the best $\bar t$ to be the one whose corresponding maximum variance
configuration possesses the highest possible variance of the observation points.
This has been proved for the following situations:

1.  n  even  and  >  6,  <  e  <  ~  .  First  it  is  shown  that  the  maximum 

confidence coefficient must be attained for some point on the apex-contour-curve
between the first peak to the left of the v-axis and the highest peak to the right
of the v-axis. This follows from the fact that
$\frac{\partial}{\partial\eta}h(\xi,\eta) > 0$, and that

$$2\pi\eta\,\frac{\partial}{\partial\xi}h(\xi,\eta) = I_2 - I_1 < 0, \qquad \xi > 0.$$

This last equation merely states that the probability in the triangle decreases
as its apex moves away from the v-axis along a horizontal line. Next observe
that each section has an axis of symmetry for a distance (which may be 0, such
as for the first and last sections) on either side of a vertical which passes
through the point having a horizontal tangent (see the arc AB, k=1, in Fig. 2).
Hence if the v-axis intersects any section to the right of the axis of symmetry,
the maximum probability of a triangle whose apex lies anywhere on the section
occurs when the apex is at or to the right of the v-axis. This is the situation
(see Fig. 2) when $\frac{1}{2}\left(\frac{n-2}{n-1}\right) \le e \le \frac{1}{2}$,
which means that the v-axis intersects the first section somewhere on the arc CE.

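Now as the apex moves from the v-axis toward the peak, the probability in
the triangle, which is one half the confidence coefficient, increases. This
follows by writing the confidence coefficient as a function of
$\tau = \frac{\bar t - A}{B-A}$ in the iterated integral

$$P[V'(-\delta,\delta)] = 2\int_0^{\varphi(\tau)} dv \int_{\nu_1(\tau)}^{\nu_2(\tau)} du\ \frac{1}{2\pi}\left[1 + \frac{u^2 + v^2}{n-2}\right]^{-\frac{n}{2}}$$

where

$$\nu_1(\tau) = \frac{2\ell(\tau - e) + 1}{2\ell f(\tau)}\,v - \sqrt{n}\,\delta,$$

$$\nu_2(\tau) = \frac{2\ell(\tau - e) - 1}{2\ell f(\tau)}\,v + \sqrt{n}\,\delta,$$

$$\varphi(\tau) = 2\sqrt{n}\,\delta\,\ell\,f(\tau).$$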


Differentiating  with  respect  to  x  gives 


2rtif2(T)  Ij-  P[V*  (-6,8)1  =  [T«n-l)e-k)  +  £(1  +  £) -ke]  ’  (x)  (J^) 


where 


$$J_i = \int_0^{\varphi(\tau)} dv\ v\left[1 + \frac{\nu_i^2(\tau) + v^2}{n-2}\right]^{-\frac{n}{2}}, \qquad i = 1, 2.$$

As the apex moves from the v-axis toward the peak (along arc DE on Fig. 2),
$\tau$ varies from $e$ to $\tfrac{1}{2}$, $k = \tfrac{n}{2} - 1$,
$f'(\tau) > 0$, and $J_2 - J_1 \le 0$. But for $n = 6, 8, 10, \ldots$ the
coefficient of $J_2 - J_1$ is $< 0$ along the arc DE and hence
$\frac{d}{d\tau}P[V'(-\delta,\delta)] > 0$. This means that the maximum
confidence coefficient is attained when the apex of the triangle is at the
point E.

2.  n  odd  and  >  3,  <  e  <  77  .  In  this  case  the  v-axis  lies  somewhere 

—  7  zn  —  —  Z 

on  arc  EG,  say  F,  Fig.  3.  The  maximum  probability  is  then  on  arc  EF.  As  the 
apex  moves  from  E  to  F,  t  varies  from  to  e,  k  =  f ’(T)  <  an<^  ^2  "  ^1  — 

But  the  coefficient  of  J2  -  is  <  0  along  EG  and  hence  ^  P[V’(-6,  5)1  <  0.  Thus 
the  maximum  confidence  coefficient  occurs  when  the  apex  of  the  triangle  is  at  E. 


3.  n  odd  and  >  7,  rr  "jv  <  e  <  .  Now  the  v-axis  would  lie  on  CE,  say 

—  9  2(n-l)  -  -  2n  ’  J 

D,  in  Fig.  3,  and  the  maximum  probability  would  lie  somewhere  on  arc  DE.  As  the 
apex  moves  from  D  to  E,  t  varies  from  e  to  ,  k  =  >  fJ(T)  >  0,  and  ^ 

's 

<  0.  But  the  coefficient  of  is  <  0  along  DE  and  ^  P[V  (-5,8)]  >  0,  i.  e.  , 

the  maximum  confidence  coefficient  occurs  at  E. 


4.  TABLE 


From the above it is seen that in general, the maximum confidence coefficient
for a band of width $2\delta\hat\sigma$ depends on the parameters
$\ell = \frac{B-A}{b-a}$, $e = \frac{\frac{a+b}{2} - A}{B-A}$, and $n$.
Hence, a table which could handle all possible experimental situations would
have to contain the value of the confidence coefficient for a range of values of
the parameters $\delta$, $\ell$, $e$, and $n$. This seemed too extensive an
undertaking at this time.

The table presented in this paper is constructed for use in the following
situation:

(1) $n$ even, specifically, $n = 4(2)20(10)30(20)50$,

(2) $\bar t = \frac{a+b}{2}$, so that an optimum solution is possible only if
$\frac{A+B}{2} = \frac{a+b}{2}$.

Hence the problem is essentially to compute the integral of the function

$$g(u,v) = \frac{1}{2\pi}\left[1 + \frac{u^2 + v^2}{n-2}\right]^{-\frac{n}{2}}$$

over the triangle shown in Fig. 4.


Figure  4 


The table consists of 13 pages. At the top of each page are listed two values
of a number $c = 1(.1)2(.2)3(.4)5(1)6(2)10(10)20, \infty$. When the maximum
variance configuration is used, i.e., $\frac{n}{2}$ observations at $A$ and
$\frac{n}{2}$ observations at $B$, $c = \frac{B-A}{b-a} = \ell$. If any other
configuration of observation points is used, still maintaining
$\bar t = \frac{a+b}{2}$, then $c = \frac{2s}{b-a}$ where $s^2$ is the variance
of the observation points. For each value of $c$, the confidence coefficient is
computed for all combinations of $n = 4(2)20(10)30(20)50, \infty$ and
$d = \sqrt{n}\,\delta = 1(.05)2.5(.1)4(.2)5(.5)7(1)10(5)20(10)50$. The confidence
coefficient is entered into the body of the table without a decimal point. Each
entry is correct to 3 significant figures and a blank space corresponds to a
rounding off to 1.
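The argument lists use the usual tabulation shorthand, in which `a(h)b` means "from a to b in steps of h". A small sketch (added here for illustration; the function name is hypothetical) expands the shorthand and confirms that the 25 finite values of $c$ together with $c = \infty$ fill 13 pages at two values per page.

```python
import re
from fractions import Fraction
from typing import List


def expand_range_notation(spec: str) -> List[Fraction]:
    """Expand e.g. '1(.1)2(.2)3' into 1, 1.1, ..., 2, 2.2, ..., 3 (exact fractions)."""
    tokens = re.split(r"[()]", spec)      # value, step, value, step, ..., value
    values = [Fraction(tokens[0])]
    for i in range(1, len(tokens) - 1, 2):
        step, stop = Fraction(tokens[i]), Fraction(tokens[i + 1])
        while values[-1] + step <= stop:
            values.append(values[-1] + step)
    return values


c_values = expand_range_notation("1(.1)2(.2)3(.4)5(1)6(2)10(10)20")
n_values = expand_range_notation("4(2)20(10)30(20)50")
d_values = expand_range_notation("1(.05)2.5(.1)4(.2)5(.5)7(1)10(5)20(10)50")
pages = (len(c_values) + 1) // 2          # +1 for the c = infinity heading
```

Exact fractions are used instead of floats so that steps such as .05 accumulate without rounding error.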

It should be noted that the table is not restricted to those values of
$c \ge 1$. Because of the symmetry of the density function $g$, it follows that
for any $c < 1$, the table with heading $1/c$ may be used. In this case the
values in the column $d = \sqrt{n}\,\delta$ must be multiplied by $1/c$.

This table was computed using an expression derived by a technique similar
to that of Dunnett and Sobel [5]. The confidence coefficient $1-\alpha$ may be
written as

$$1-\alpha = \frac{2(n-2)}{\pi}\int_0^{\pi/2} d\theta \int_0^{r(\theta)} \rho\,d\rho\,(1+\rho^2)^{-\frac{n}{2}}$$

$$\qquad = 1 - \frac{2}{\pi}\int_{\tan^{-1}c}^{\frac{\pi}{2}+\tan^{-1}c} d\varphi\,\big[1 + k^2\csc^2\varphi\big]^{-\frac{n}{2}+1}$$

The  table  is  composed  of  computer  print-out  and  thus  the  letters  c,n,  and  d 
appear  as  capitals. 
where

$$\rho^2 = \frac{u^2}{n-2} + \frac{v^2}{n-2},$$

$$r(\theta) = \frac{\delta\sqrt{n}}{\sqrt{n-2}}\,\frac{\sin\psi}{\sin(\theta+\psi)},$$

$$\theta = \tan^{-1}\frac{v}{u},$$

$$\psi = \tan^{-1}c,$$

$$k^2 = \frac{\delta^2 c^2 n}{(n-2)(1+c^2)},$$

$$c = \frac{2s}{b-a},$$

$$\varphi = \theta + \psi.$$
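The polar form of the confidence coefficient lends itself to a direct numerical check. The sketch below is an added illustration and is not the series method used for the printed table: it integrates $\rho$ out analytically and applies Simpson's rule in $\theta$, using the unscaled distance $c\,d/(c\cos\theta + \sin\theta)$ from the origin to the hypotenuse of the first-quadrant triangle with legs $d = \sqrt{n}\,\delta$ and $c\,d$. The function names and the number of panels are assumptions.

```python
import math
from typing import Callable


def simpson(func: Callable[[float], float], lower: float, upper: float,
            panels: int = 2000) -> float:
    """Composite Simpson rule with an even number of panels."""
    if panels % 2:
        panels += 1
    h = (upper - lower) / panels
    total = func(lower) + func(upper)
    for i in range(1, panels):
        total += (4 if i % 2 else 2) * func(lower + i * h)
    return total * h / 3.0


def confidence_coefficient(c: float, d: float, n: int, panels: int = 2000) -> float:
    """1 - alpha for the rhombus |u|/d + |v|/(c d) <= 1 under the density g."""
    nu = n - 2
    if nu <= 0 or c <= 0 or d <= 0:
        raise ValueError("need n > 2, c > 0 and d > 0")

    def beyond(theta: float) -> float:
        # probability mass beyond the hypotenuse along the ray at angle theta
        if math.isinf(c):
            r = d / math.cos(theta)            # strip |u| <= d
        else:
            r = c * d / (c * math.cos(theta) + math.sin(theta))
        return (1.0 + r * r / nu) ** (-nu / 2.0)

    return 1.0 - (2.0 / math.pi) * simpson(beyond, 0.0, math.pi / 2.0, panels)
```

Two checks follow from the text: for $c = \infty$ and $n = 4$ the result must equal the two-sided probability $d/\sqrt{2+d^2}$ of a t-variable with 2 degrees of freedom, and the value for $(c, d)$ must agree with the value for $(1/c,\ c\,d)$ by the symmetry of $g$.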


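Define

$$Q_{\frac{n}{2}-1} = \frac{1}{2\pi}\int_{\tan^{-1}c}^{\frac{\pi}{2}+\tan^{-1}c} d\varphi\,\big[1 + k^2\csc^2\varphi\big]^{-\frac{n}{2}+1}$$

and consider

$$Q_{\frac{n}{2}} - Q_{\frac{n}{2}-1} = -\frac{1}{2\pi}\int_{\tan^{-1}c}^{\frac{\pi}{2}+\tan^{-1}c} d\varphi\,\big[1 + k^2\csc^2\varphi\big]^{-\frac{n}{2}}\,k^2\csc^2\varphi.$$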


Making  use  of  the  change  of  variable 


1  +  ^1  +  tan2cp 


it  is  seen  after  some  calculation  that 
4


\‘Qn  , 

2  2 


_  kq+ic) 


2.”  2(”'» 


4jt 


{Bf1(c,k)_2  '  2(n"3)]  +  Bf2(c,k)'2’I<n‘3>  } 


where 


fx(c,k) 


(i + S}‘ 


1  + 


f2(c,k) 


+  (l  +  $72 

z 

and  [p, q]  -  J tP"1(l  -t)q  *dt  is  the  incomplete  beta  function. 


Now  for  the  case  that  n  is  odd  and  >  3 

1-a  -  1  -  4Qn 
2 


■  • «.  >+«*, 


■  Q  )  +  ...+«-  Q  )  +  Q  ) 
—  l  —  i  n  9  j.  A  3 

2  2  '  1  2  ~  L  2  ~  ^  2  2  2 


But 


%  -  &  h"  rr 


-I-  sin 


J l+(l+nS2)c2  J l+(l+n52)c2- 


so  that  finally,  in  terms  of  the  incomplete  beta  function  ratio  I^fp^q] 
Bz[p,q]/B1[p,q], 
1-5  -  1  -  \  Tsin’1  -ssss  — 


l+(l+n82)c2 


+  8  in 


-1 


J  l+(l+n62)c2j 


j(n-3) 


2k r  y  jdlLadiif  A  (i.1 

*  *■  (l+k2)j(2j-l)I  '  fl(c'k)L2  J 


+  If2(c,k)L2'j 


(1) 


n-5, 7,9,. . . 


2  f  .  -1  1 

'*Lsn  JI+  (W)c 


sin_1  _ c 

'/l+(l+n62)cZ  ■/  L+(l+n62) 


?]’  * 


:3  . 


The  formula 


iz(pj>  -vTz  y  p1*1  (l-z1) 

i»o  4  <ll> 


is  used  for  calculating  the  incomplete  beta  function  ratios  in  (1). 
For  n  even  and  >  4 


1-a  »  1-4 

2 


-  %  >  +  %  ■%  >  + 

2  2  2  2  "  2 


+  (Q2-Qi>  + 


But  ^  .  Hence  after  some  calculation 


l-a  -  k 


z 


(2 j-2)  1 


j-1  (l+k2)j'2[(j-l)l]24 


j-i  CIfi(c/k)  ,2'J-2  +  Xf2(c,k)  ?j'2  )  <2) 
The formula


j-2 


1z(-bi~b  "  \ tan  1  ^  (2i+i>T(1"z) 

i-0 


is  used  for  evaluating  the  incomplete  beta  function  ratios  appearing  in  (2). 

The  actual  computations  were  performed  on  a  Philco  2000  digital  computer 
using equations (1) and (2). For $n \le 50$, which is the range of finite $n$ in the
table,  an  error  analysis  showed  that  the  resulting  probabilities  could  be  off  at 
most  by  seven  digits  in  the  7th  place.  To  reduce  the  size  of  the  table,  however, 
these  were  rounded  off  to  three  figures.  This  should  be  sufficient  for  most 
applications. 

Now

$$\lim_{n\to\infty} g(u,v) = \frac{1}{2\pi}\exp\left[-\tfrac{1}{2}(u^2 + v^2)\right] \qquad (3)$$

which is the uncorrelated bivariate normal distribution with zero means and unit
variances. To make the calculation for $n = \infty$, which amounts to the
integral of (3) over the triangle of Fig. 4, a method outlined by Owen [7] was
used. For $1 \le c < \infty$ this gives


1-a  -  1-4(E+F) 


where 


E  .  T  (x,  i)  , 
F  =  \  G(x)+G(y)  -  G(x)G(y)  +  l(y,  ^  , 


i-0 


$$G(x) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{x} e^{-\frac{p^2}{2}}\,dp.$$

For $c = \infty$, $E = 0$ and

$$F = \tfrac{1}{2}\,[1 - G(x)]$$

where

$$x = \delta\sqrt{n}.$$

These  again  were  performed  on  the  Philco  2000  and  the  computations  were  such 
that  the  resulting  confidence  coefficients  are  correct  to  three  significant 
figures. 
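The $n = \infty$ entries can be cross-checked without Owen's method. The sketch below is an added illustration that swaps in plain numerical integration of the normal density (3) in polar form; the function name is hypothetical. Two closed-form cases serve as checks: $c = \infty$ gives $2G(d) - 1$, and $c = 1$ gives $[2G(d/\sqrt{2}) - 1]^2$, because the region is then a square rotated by 45 degrees and (3) is rotation invariant.

```python
import math
from statistics import NormalDist


def normal_limit_coefficient(c: float, d: float, panels: int = 2000) -> float:
    """Probability of |u|/d + |v|/(c d) <= 1 under the standard bivariate normal."""
    if c <= 0 or d <= 0:
        raise ValueError("need c > 0 and d > 0")
    if panels % 2:
        panels += 1

    def beyond(theta: float) -> float:
        # mass beyond the hypotenuse along the ray at angle theta: exp(-r^2/2)
        if math.isinf(c):
            r = d / math.cos(theta)
        else:
            r = c * d / (c * math.cos(theta) + math.sin(theta))
        return math.exp(-0.5 * r * r)

    # composite Simpson rule on [0, pi/2]
    h = (math.pi / 2.0) / panels
    total = beyond(0.0) + beyond(math.pi / 2.0)
    for i in range(1, panels):
        total += (4 if i % 2 else 2) * beyond(i * h)
    return 1.0 - (2.0 / math.pi) * total * h / 3.0


G = NormalDist().cdf   # the function G(x) of the text
```

For instance, `normal_limit_coefficient(math.inf, 1.96)` should agree with `2 * G(1.96) - 1` to well within the three figures kept in the table.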

One additional observation is that, as $c \to \infty$ for a fixed
$\delta\sqrt{n}$, the confidence coefficient is the area of the function $g$
over an infinite strip parallel to the v-axis. Hence, the values in the table
with $c = \infty$ could have been obtained from a t-table. Each column
corresponds to a t-variable whose degrees of freedom is two less than the sample
size heading.


5.  EFFICIENCY 

In Scheffé [3, pp. 52, 53], it is seen that a $1-\alpha$ confidence band for the
true line consists of all points $(t,y)$ satisfying

$$\big[y - \hat\beta_0 - \hat\beta_1 (t - \bar t)\big]^2 \le 2F_{\alpha;2,n-2}\,\hat\sigma^2\left[\frac{1}{n} + \frac{(t - \bar t)^2}{n s^2}\right].$$

This gives a band about the fitted line, bounded by the two branches of a
hyperbola. In order to use this for comparison purposes with the method of
this paper, it is restricted to just the interval $[a,b]$. The confidence
coefficient of this band is, of course, no longer $1-\alpha$ but $\ge 1-\alpha$.

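The area of the hyperbolic band over the interval $[a,b]$ is given by

$$A_1 = 2\hat\sigma\sqrt{2F_{\alpha;2,n-2}}\int_a^b\left[\frac{1}{n} + \frac{(t - \bar t)^2}{n s^2}\right]^{\frac{1}{2}} dt.$$

It is clear that this area is minimized when $\bar t = \frac{a+b}{2}$ and $s^2$
is maximized. Thus if $\frac{A+B}{2} = \frac{a+b}{2}$, $s^2$ is maximized for
$n$ even when $\frac{n}{2}$ observations are at $A$ and $\frac{n}{2}$
observations are at $B$. In this case $s^2 = \frac{1}{4}(B-A)^2$. Thus

$$A_1 = \frac{\hat\sigma\sqrt{2F_{\alpha;2,n-2}}\,(b-a)}{\sqrt{n}}\left[\left(1 + \frac{1}{c^2}\right)^{\frac{1}{2}} + c\log\frac{\sqrt{1+c^2}+1}{c}\right],$$

where $c = \frac{B-A}{b-a}$. The area $A_2$ of our band is
$2\delta\hat\sigma(b-a)$. Hence, the ratio $A_1/A_2$, which will be referred to
as the efficiency of our method, is given by

$$\frac{A_1}{A_2} = \frac{\sqrt{2F_{\alpha;2,n-2}}}{2(\delta\sqrt{n})}\left[\left(1 + \frac{1}{c^2}\right)^{\frac{1}{2}} + c\log\frac{\sqrt{1+c^2}+1}{c}\right]. \qquad (4)$$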


Eq. (4) is valid for any $0 < c < \infty$. It was noted in Section 4 that
$\lim_{c\to\infty}(\delta\sqrt{n}) = t_{\frac{\alpha}{2};\,n-2}$. But

$$\lim_{c\to\infty} c\log\frac{\sqrt{c^2+1}+1}{c} = 1.$$

Hence

$$\lim_{c\to\infty}\frac{A_1}{A_2} = \frac{\sqrt{2F_{\alpha;2,n-2}}}{t_{\frac{\alpha}{2};\,n-2}}. \qquad (5)$$

The symmetry of the function $g$ means that
$\lim_{c\to 0} c\,\delta\sqrt{n} = t_{\frac{\alpha}{2};\,n-2}$. Also

$$\lim_{c\to 0} c\log\frac{\sqrt{c^2+1}+1}{c} = 0.$$

Hence

$$\lim_{c\to 0}\frac{A_1}{A_2} = \frac{1}{2}\,\frac{\sqrt{2F_{\alpha;2,n-2}}}{t_{\frac{\alpha}{2};\,n-2}}. \qquad (6)$$

Eqs. (4), (5), and (6) summarize the results of this section. Fig. 5 is a
graph of the efficiency, for each of three values of $n$, as a function of $c$.
The confidence coefficient selected is .95.
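Eq. (4) is straightforward to evaluate once $d = \delta\sqrt{n}$ is known for the chosen $c$, $n$ and confidence coefficient. The sketch below is an added illustration with hypothetical function names; it assumes $d$ is supplied by the user (for example read from the table) and uses the closed form $F_{\alpha;2,\nu} = \frac{\nu}{2}(\alpha^{-2/\nu} - 1)$, which holds for an F-variable with 2 numerator degrees of freedom.

```python
import math


def f_upper_point_2(alpha: float, nu: int) -> float:
    """Upper alpha point of F with 2 and nu d.f. (closed form for 2 numerator d.f.)."""
    return (nu / 2.0) * (alpha ** (-2.0 / nu) - 1.0)


def area_bracket(c: float) -> float:
    """The bracketed factor of Eq. (4)."""
    return math.sqrt(1.0 + 1.0 / c ** 2) + c * math.log((math.sqrt(1.0 + c ** 2) + 1.0) / c)


def efficiency(c: float, d: float, n: int, alpha: float) -> float:
    """A1/A2 of Eq. (4); d = sqrt(n) * delta must match c, n and alpha."""
    if c <= 0 or d <= 0 or n <= 2:
        raise ValueError("need c > 0, d > 0 and n > 2")
    two_f = 2.0 * f_upper_point_2(alpha, n - 2)
    return math.sqrt(two_f) / (2.0 * d) * area_bracket(c)
```

The limits (5) and (6) give a quick consistency check. For $n = 4$ and $\alpha = .05$, $F_{.05;2,2} = 19$ and the upper 2.5 percent point of a t-variable with 2 degrees of freedom solves $x/\sqrt{2+x^2} = .95$; with $d$ set to that value and $c$ very large the function approaches $\sqrt{38}/x$, and with $d = x/c$ and $c$ very small it approaches half of that.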
Figure 5
APPENDIX

Let $\{z_1, z_2, \ldots, z_n\}$ be $n$ points in the unit interval $[0,1]$. For a
fixed $\bar z = \frac{1}{n}\sum_{i=1}^{n} z_i$ in $[0,1]$ the problem is to
maximize

$$n s^2 = \sum_{i=1}^{n}(z_i - \bar z)^2 = \sum_{i=1}^{n} z_i^2 - n\bar z^2.$$

The claim is that, to maximize $n s^2$, set each $z_i$, except for one, equal to
0 or to 1, keeping $\sum_{i=1}^{n} z_i = n\bar z$. For suppose, without loss of
generality, $0 < z_1 \le z_2 < 1$. Then there exists $\delta > 0$ such that

$$0 \le z_1 - \delta \le 1,$$

$$0 \le z_2 + \delta \le 1,$$

and

$$(z_1 - \delta)^2 + (z_2 + \delta)^2 = z_1^2 + z_2^2 + 2\delta^2 + 2\delta(z_2 - z_1) > z_1^2 + z_2^2,$$

i.e., $n s^2$ may be increased. The actual configuration for any $\bar z$ such
that $\frac{k}{n} \le \bar z \le \frac{k+1}{n}$, $k = 0, 1, \ldots, n-1$, is $k$
$z_i$'s at 1, one $z_i$ at $n\bar z - k$, and $n-(k+1)$ $z_i$'s at 0. The
resulting maximum variance is

$$\frac{k + (n\bar z - k)^2}{n} - \bar z^2.$$

Thus for $\{t_1, t_2, \ldots, t_n\} \subset [A,B]$, the maximum variance
configuration for a fixed $\bar t$,

$$\frac{k}{n} \le \frac{\bar t - A}{B-A} \le \frac{k+1}{n}, \qquad k = 0, 1, \ldots, n-1,$$

is given by $k$ $t_i$'s at $B$, one $t_i$ at $n(\bar t - A) - k(B-A) + A$, and
$n-(k+1)$ $t_i$'s at $A$. The maximum variance is

$$(B-A)^2\left[\frac{k + (n\tau - k)^2}{n} - \tau^2\right], \qquad \tau = \frac{\bar t - A}{B-A}.$$